General Summary

  • Cantiche: 3
  • Canti: 100
  • Verses: 14'233
  • Words: 56'003
  • Distinct words: 8'771

INFO

The whole text has been pre-processed with the following steps:

  • Lemmatisation

    • Lemmatisation is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form. For example:

      • good better best -> good
      • have has had -> have
  • Stopword removal

    • Some words (stopwords) are filtered out because they are too frequent and carry little meaning for the analysis, for example: “negl” (in the), “stavano” (they were), “faceva” (was doing), “una” (a), “con” (with).
    • The stopword list depends on the analysis, so it can be changed.
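The two preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used for this dashboard: the lemma dictionary `LEMMAS` and the function `preprocess` are hypothetical, a real analysis would rely on a full Italian lemmatiser, and the stoplist only contains the example words listed above. Splitting on anything that is not a letter is one way elided forms such as “negl’occhi” end up as the bare token “negl”.

```python
import re

# Hypothetical toy lemma dictionary (inflected form -> lemma).
# A real analysis would use a full Italian lemmatiser instead.
LEMMAS = {
    "pene": "pena",      # plural -> singular
    "bolge": "bolgia",   # plural -> singular
    "ride": "ridere",    # verb form -> infinitive
}

# Only the example stopwords quoted above; a real stoplist is much longer
# and can be changed depending on the analysis.
STOPWORDS = {"negl", "stavano", "faceva", "una", "con"}

# Letters only (including accented vowels): apostrophes and punctuation split tokens.
TOKEN_RE = re.compile(r"[a-zàèéìíòóù]+")


def preprocess(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Lower-case and tokenise the text, drop stopwords, then map tokens to lemmas."""
    tokens = TOKEN_RE.findall(text.lower())
    # Stopwords are matched on the surface form; remaining tokens are lemmatised
    # when they appear in LEMMAS, otherwise kept unchanged.
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in stopwords]
```

For example, `preprocess("Una fiera con le pene")` returns `['fiera', 'le', 'pena']`: “una” and “con” are dropped as stopwords, “pene” is mapped to its lemma “pena”, and “le” survives only because it is not in this toy stoplist.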

GUIDES

Please refer to this site for the mission and guidelines.

WordCloud - Word frequency for the whole Divina Commedia

# triplets per cantica

Tables - Word frequency for the whole Divina Commedia

# verses per cantica
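The word cloud and the frequency tables rest on simple token counts. A minimal sketch, assuming the input is a list of already preprocessed tokens; the helper names `word_frequencies` and `summary_counts` are hypothetical. The same counts yield totals like the “words” and “distinct words” figures in the summary, although the dashboard does not say whether those are taken before or after preprocessing.

```python
from collections import Counter


def word_frequencies(tokens: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent tokens with their counts, most frequent first."""
    return Counter(tokens).most_common(top_n)


def summary_counts(tokens: list[str]) -> dict[str, int]:
    """Total number of tokens and number of distinct tokens."""
    return {"words": len(tokens), "distinct_words": len(set(tokens))}


if __name__ == "__main__":
    # Made-up tokens, for illustration only.
    tokens = ["pena", "fiera", "pena", "bolgia", "pena", "fiera"]
    print(word_frequencies(tokens, top_n=2))
    print(summary_counts(tokens))
```

Running the same functions on the tokens of a single cantica gives the per-cantica tables.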

Focus on each cantica

Inferno

Purgatorio

Paradiso

Inferno

Purgatorio

Paradiso

TF-IDF per cantica

INFO

In information retrieval, TF-IDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection.
The TF-IDF value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which adjusts for the fact that some words are more frequent in general.
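A minimal sketch of the computation, assuming each cantica is one document made of its preprocessed tokens and using the common variant tf = count / document length and idf = ln(N / number of documents containing the term). The dashboard does not state which weighting it uses; the functions `tf_idf` and `top_terms` and the token lists below are hypothetical and made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF score of every term in every document.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = ln(N / number of documents containing t)
    """
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    # Document frequency: in how many documents each term occurs at least once.
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())
    scores: dict[str, dict[str, float]] = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (n / total) * math.log(n_docs / df[term]) for term, n in c.items()
        }
    return scores


def top_terms(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """The k terms with the highest score in one document."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    docs = {
        "Inferno": ["pena", "bolgia", "pena", "occhi"],
        "Purgatorio": ["monte", "occhi", "pena"],
        "Paradiso": ["luce", "ridere", "occhi"],
    }
    for cantica, s in tf_idf(docs).items():
        print(cantica, top_terms(s))
```

With only three documents, any word that occurs in all three cantiche gets idf = ln(3/3) = 0 and can never rank among the most distinctive words of a cantica: this is the offset described above.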

PERSONAL COMMENTS

TF-IDF shows how word choice changes from one cantica to another.

In Paradiso the most important words are Cristo (Christ), Ridere (to laugh) and Letizia (gladness), while in Inferno they are Bolgia (ditch, one of the trenches of Malebolge), Pena (punishment) and Fiera (wild beast; here the beasts that obstruct Dante's path). These words alone already suggest the atmosphere and mood that distinguish each cantica.

TF-IDF Inferno

TF-IDF Purgatorio

TF-IDF Paradiso